LIMIT THEOREMS FOR SAMPLE EIGENVALUES IN A 
GENERALIZED SPIKED POPULATION MODEL 
Abstract. In the spiked population model introduced by Johnstone [10], the population covariance matrix has all its eigenvalues equal to unity except for a few fixed eigenvalues (spikes). The question is to quantify the effect of the perturbation caused by the spike eigenvalues. Baik and Silverstein [6] establish the almost sure limits of the extreme sample eigenvalues associated to the spike eigenvalues when the population and the sample sizes become large. In a recent work [5], we have provided the limiting distributions for these extreme sample eigenvalues. In this paper, we extend this theory to a generalized spiked population model where the base population covariance matrix is arbitrary, instead of the identity matrix as in Johnstone's case. New mathematical tools are introduced for establishing the almost sure convergence of the sample eigenvalues generated by the spikes.



1. Introduction 

Let $(T_p)$ be a sequence of $p \times p$ non-random and nonnegative definite Hermitian matrices and let $(w_{ij})$, $i, j \ge 1$, be a doubly infinite array of i.i.d. complex-valued random variables satisfying

$$E(w_{11}) = 0, \qquad E(|w_{11}|^2) = 1, \qquad E(|w_{11}|^4) < \infty.$$

Write $Z_n = (w_{ij})_{1 \le i \le p,\, 1 \le j \le n}$, the upper-left $p \times n$ block, where $p = p(n)$ is related to $n$ such that $p/n \to y > 0$ when $n \to \infty$. Then the matrix $S_n = \frac{1}{n} T_p^{1/2} Z_n Z_n^* T_p^{1/2}$ can be considered as the sample covariance matrix of an i.i.d. sample $(x_1, \ldots, x_n)$ of $p$-dimensional observation vectors $x_j = T_p^{1/2} u_j$, where $u_j = (w_{ij})_{1 \le i \le p}$ denotes the $j$-th column of $Z_n$. Throughout the paper, $A^{1/2}$ stands for any Hermitian square root of a nonnegative definite (n.n.d.) Hermitian matrix $A$.

Assume that the empirical spectral distribution (ESD) of $T_p$ converges weakly to a nonrandom probability distribution $H$ on $[0, \infty)$. It is then well known that the ESD of $S_n$ converges to a nonrandom limiting spectral distribution (LSD) $G$ [11, 13].



Let $\lambda_{n,1} \ge \cdots \ge \lambda_{n,p}$ be the sample eigenvalues, i.e. the eigenvalues of the sample covariance matrix $S_n$. The so-called null case corresponds to the situation $T_p = I_p$, so that, assuming $y < 1$, the LSD $G$ reduces to the Marcenko-Pastur law with support $\Gamma_G = [a_y, b_y]$, where $a_y = (1 - \sqrt{y})^2$ and $b_y = (1 + \sqrt{y})^2$. Furthermore, the extreme sample eigenvalues $\lambda_{n,1}$ and $\lambda_{n,p}$ almost surely tend to $b_y$ and $a_y$, respectively, and the sample eigenvalues $(\lambda_{n,j})$ fill completely the interval $[a_y, b_y]$. However, as pointed out by Johnstone [10], many empirical data sets demonstrate a significant deviation from this null case, since some of the extreme sample eigenvalues are well separated from an inner bulk interval. As a possible explanation of such phenomena, Johnstone proposes a spiked population model where all eigenvalues of $T_p$ are equal to one except for a fixed and relatively small number of them (spikes). In other words, the population eigenvalues $\{\beta_{n,j}\}$ of $T_p$ are

$$\underbrace{\alpha_1, \ldots, \alpha_1}_{n_1}, \ldots, \underbrace{\alpha_K, \ldots, \alpha_K}_{n_K}, \underbrace{1, \ldots, 1}_{p-M},$$

where $M$ is fixed, as are the multiplicities $(n_k)$, which satisfy $n_1 + \cdots + n_K = M$. Clearly, this spiked population model can be viewed as a finite-rank perturbation of the null case.

Obviously, the LSD $G$ of $S_n$ is not affected by this small perturbation and still equals the Marcenko-Pastur law. However, the asymptotic behavior of the extreme eigenvalues of $S_n$ is significantly different from the null case. The fluctuation of the largest eigenvalue $\lambda_{n,1}$ in the case of complex Gaussian variables has recently been studied in Baik et al. [2]. These authors prove a transition phenomenon: the weak limit as well as the scaling of $\lambda_{n,1}$ differ according to the location of the spikes with respect to a critical value $1 + \sqrt{y}$. In Baik and Silverstein [6], the authors consider the spiked population model with general random variables: complex or real and not necessarily Gaussian. For the almost sure limits of the extreme sample eigenvalues, they also find that these limits depend on the critical values $1 + \sqrt{y}$ for the largest sample eigenvalues, and $1 - \sqrt{y}$ for the smallest ones. For example, if there are $m$ eigenvalues in the population covariance matrix larger than $1 + \sqrt{y}$, then the $m$ largest sample eigenvalues $\lambda_{n,1}, \ldots, \lambda_{n,m}$ converge to limits above the right edge $b_y$ of the limiting Marcenko-Pastur law; see Section 4.1 for more details. In a recent work [5], considering general random matrices as in [6], we have established central limit theorems for these extreme sample eigenvalues generated by spike eigenvalues lying outside the critical interval $[1 - \sqrt{y}, 1 + \sqrt{y}]$.

The spiked population model also extends to other random matrix ensembles through the general concept of small-rank perturbations. The goal is again to examine the effect of such perturbations on the extreme sample eigenvalues. In a series of recent papers [12, 9, 8], several results in this vein are established for ensembles of the form $M_n = W_n + n^{-1/2} V$, where $W_n$ is a standard Wigner matrix and $V$ a small-rank matrix.

The present work is motivated by a generalization of Johnstone's spiked population model, defined as follows. The population covariance matrix $T_p$ possesses two sets of eigenvalues: a small number of them, called generalized spikes, are well separated (in a sense to be defined later) from a base set $(\beta_{n,i})$. In other words, the spectrum of $T_p$ reads as

$$\underbrace{\alpha_1, \ldots, \alpha_1}_{n_1}, \ldots, \underbrace{\alpha_K, \ldots, \alpha_K}_{n_K}, \beta_{n,1}, \ldots, \beta_{n,p-M}.$$

Therefore, this scheme can be viewed as a finite-rank perturbation of a general population covariance matrix with eigenvalues $\{\beta_{n,j}\}$.

The empirical distributions generated by the eigenvalues $(\beta_{n,i})$ will be assumed to have a limit distribution $H$. Note that $H$ is also the LSD of $T_p$ since the perturbation is of finite rank. As in Johnstone's spiked population model, the LSD $G$ of the sample covariance matrix $S_n$ is still not affected by the spikes. The aim of this work is to identify the effect caused by the spikes $(\alpha_k)$ on a particular subset of sample eigenvalues. The results obtained here extend those of [6] to the present generalized scheme.

The remaining sections of the paper are organized as follows. Section 2 gives the precise definition of the generalized spiked population model. Next, in Section 3 we recall several useful results on the convergence of the ESD of general sample covariance matrices. In Section 4 we examine the strong pointwise convergence of the sample eigenvalues associated to spikes. We then establish in Section 5 a CLT for these sample eigenvalues, using the methodology developed in [5]. Preliminary lemmas and their proofs are gathered in the last section.

2. Generalized spiked population model 

In a generalized spiked population model, the population covariance matrix $T_p$ takes the form

$$T_p = \begin{pmatrix} \Sigma & 0 \\ 0 & V_p \end{pmatrix},$$

where $\Sigma$ and $V_p$ are nonnegative and nonrandom Hermitian matrices of dimension $M \times M$ and $p' \times p'$, respectively, with $p' = p - M$. The submatrix $\Sigma$ has $K$ eigenvalues $\alpha_1 > \cdots > \alpha_K > 0$ of respective multiplicities $(n_k)$, and $V_p$ has $p'$ eigenvalues $\beta_{n,1} \ge \cdots \ge \beta_{n,p'}$.

Throughout the paper, we assume that the following assumptions hold. 

(a) $w_{ij}$, $i, j = 1, 2, \ldots$ are i.i.d. complex random variables with $E w_{11} = 0$, $E|w_{11}|^2 = 1$, and $E|w_{11}|^4 < \infty$.

(b) $n = n(p)$ with $y_n = p'/n \to y > 0$ as $n \to \infty$.

(c) The sequence of ESDs $H_n$ of $(T_p)$, i.e. generated by the population eigenvalues $\{\alpha_k, \beta_{n,j}\}$, weakly converges to a probability distribution $H$ as $n \to \infty$.

(d) The sequence $(\|T_p\|)$ of spectral norms of $(T_p)$ is bounded.

For any measure $\mu$ on $\mathbb{R}$, we denote by $\Gamma_\mu$ the support of $\mu$, a closed set.

Definition 2.1. An eigenvalue $\alpha$ of the matrix $\Sigma$ is called a generalized spike eigenvalue if $\alpha \notin \Gamma_H$.

To avoid confusion between spikes and non-spike eigenvalues, we further assume that

(e) $\max_{1 \le j \le p'} d(\beta_{n,j}, \Gamma_H) = \varepsilon_n \to 0$,

where $d(x, A)$ denotes the distance of a point $x$ to a set $A$. Note that there is a positive constant $\delta$ such that $d(\alpha_k, \Gamma_H) > \delta$ for all $k \le K$.

The above definition of generalized spikes is consistent with Johnstone's original definition of (ordinary) spikes, since in that case we have $H_n = H = \delta_1$ and $\alpha \notin \Gamma_H$ simply means $\alpha \ne 1$.

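Let us decompose the observation vectors $x_j = T_p^{1/2} u_j$, $j = 1, \ldots, n$, where $u_j = (w_{ij})_{1 \le i \le p}$, into blocks,

$$x_j = \begin{pmatrix} \xi_j \\ \eta_j \end{pmatrix}, \qquad \text{with } \xi_j = \Sigma^{1/2} (w_{ij})_{1 \le i \le M}, \quad \eta_j = V_p^{1/2} (w_{ij})_{M < i \le p}.$$

Note that both $\{\xi_1, \ldots, \xi_n\}$ and $\{\eta_1, \ldots, \eta_n\}$ are i.i.d. sequences. We also denote the coordinates of $\xi_j$ by $\xi_j = (\xi_j(1), \ldots, \xi_j(M))^T$.

Similarly, the sample covariance matrix $S_n = \frac{1}{n} T_p^{1/2} Z_n Z_n^* T_p^{1/2}$ is decomposed as

$$S_n = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} = \begin{pmatrix} X_1 X_1^* & X_1 X_2^* \\ X_2 X_1^* & X_2 X_2^* \end{pmatrix},$$

with

$$X_1 = \frac{1}{\sqrt{n}} (\xi_1, \ldots, \xi_n)_{M \times n} = \frac{1}{\sqrt{n}} \xi_{1:n}, \qquad X_2 = \frac{1}{\sqrt{n}} (\eta_1, \ldots, \eta_n)_{p' \times n} = \frac{1}{\sqrt{n}} \eta_{1:n}.$$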

Throughout the paper and for any Hermitian matrix $A$, we order its eigenvalues in descending order as $\lambda_1^A \ge \lambda_2^A \ge \cdots$. By definition, the sample eigenvalues $\{\lambda_j^{S_n}, 1 \le j \le p\}$ are solutions to the equation

$$0 = |\lambda I - S_n| = |\lambda I - S_{22}| \, |\lambda I - K_n(\lambda)|, \tag{2.1}$$

with a random sesquilinear form

$$K_n(\lambda) = S_{11} + S_{12} (\lambda I - S_{22})^{-1} S_{21}. \tag{2.2}$$

Note that the factorization (2.1) holds for any $\lambda \notin \operatorname{spec}(S_{22})$. This identity will play a central role in our analysis.
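As a quick numerical sanity check of (2.1)-(2.2), the sketch below builds a small diagonal $T_p = \operatorname{diag}(\Sigma, V_p)$ and forms $S_n$ with real Gaussian entries. The sizes, the spike values $(15, 6, 2)$ and the uniformly drawn base spectrum are arbitrary illustrative choices, not taken from the paper. It compares both sides of (2.1) at a point $\lambda$ outside $\operatorname{spec}(S_{22})$, and checks that the largest sample eigenvalue makes $\lambda I - K_n(\lambda)$ singular. The helper `K` is our own name for the random form (2.2).

```python
import numpy as np

rng = np.random.default_rng(0)
M, p_prime, n = 3, 20, 50                  # toy sizes (illustrative assumption)
p = M + p_prime

# Diagonal T_p = diag(Sigma, V_p): spikes first, then base eigenvalues.
t_diag = np.concatenate([[15.0, 6.0, 2.0], rng.uniform(1.0, 4.0, size=p_prime)])
Z = rng.standard_normal((p, n))            # real Gaussian w_ij
X = np.sqrt(t_diag)[:, None] * Z           # columns are x_j = T_p^{1/2} u_j
S = X @ X.T / n                            # S_n = (1/n) T^{1/2} Z Z^* T^{1/2}
S11, S12, S21, S22 = S[:M, :M], S[:M, M:], S[M:, :M], S[M:, M:]

def K(lam):
    """Random form (2.2): K_n(lam) = S11 + S12 (lam I - S22)^{-1} S21."""
    return S11 + S12 @ np.linalg.solve(lam * np.eye(p_prime) - S22, S21)

# Both sides of the determinant factorization (2.1) at a point outside spec(S22).
lam = 50.0
lhs = np.linalg.det(lam * np.eye(p) - S)
rhs = np.linalg.det(lam * np.eye(p_prime) - S22) * np.linalg.det(lam * np.eye(M) - K(lam))
print("relative gap in (2.1):", abs(lhs - rhs) / abs(lhs))

# A sample eigenvalue outside spec(S22) makes lam I - K_n(lam) singular.
lam1 = np.linalg.eigvalsh(S)[-1]
smallest = np.min(np.abs(np.linalg.eigvalsh(lam1 * np.eye(M) - K(lam1))))
print("smallest |eigenvalue| of lam1 I - K_n(lam1):", smallest)
```

The factorization is just the Schur-complement formula for the lower-right block of $\lambda I - S_n$, which is why only the small $M \times M$ matrix $K_n(\lambda)$ needs to be examined once $\lambda$ is away from the spectrum of $S_{22}$.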

3. Known results on the spectrum of large sample covariance matrices

3.1. Marcenko-Pastur distributions. In this section $y$ is an arbitrary positive constant and $H$ an arbitrary probability measure on $\mathbb{R}_+$. Define on the set

$$\mathbb{C}^+ := \{ z \in \mathbb{C} : \Im(z) > 0 \}$$

the map

$$g(s) = g_{y,H}(s) = -\frac{1}{s} + y \int \frac{t}{1 + ts} \, dH(t), \qquad s \in \mathbb{C}^+. \tag{3.1}$$

It is well known ([4, Chap. 5]) that $g$ is a one-to-one map from $\mathbb{C}^+$ onto itself, and that the inverse map $m = g^{-1}$ corresponds to the Stieltjes transform of a probability measure $F_{y,H}$ on $[0, \infty)$. Throughout the paper, and with a small abuse of language, we refer to $F_{y,H}$ as the Marcenko-Pastur (M.P.) distribution with indexes $(y, H)$.

This family of distributions arises naturally as follows. Consider the companion matrix $\underline{S}_n = \frac{1}{n} Z_n^* T_p Z_n$ of the sample covariance matrix $S_n$. The spectra of $S_n$ and $\underline{S}_n$ are identical except for $|n - p|$ zeros. It is then well known ([11], [4, Chap. 5]) that under Conditions (a)-(d), the ESD of $\underline{S}_n$ converges to the M.P. distribution $F_{y,H}$. The terminology is slightly ambiguous since the classical M.P. distribution refers to the limit of the ESD of $S_n$ when $T_p = I_p$.

Note that we shall always extend a function $h$ defined on $\mathbb{C}^+$ to the real axis $\mathbb{R}$ by taking the limits $\lim_{\varepsilon \to 0^+} h(x + i\varepsilon)$ for real $x$ whenever these limits exist. For $\alpha \notin \Gamma_H$ and $\alpha \ne 0$, define

$$\psi(\alpha) = \psi_{y,H}(\alpha) := g(-1/\alpha) = \alpha + y\alpha \int \frac{t}{\alpha - t} \, dH(t). \tag{3.2}$$

Note that even though this formula could be extended to $\alpha = 0$ when $0 \notin \Gamma_H$, we will see below that $\alpha$ is related to $-1/m$, where $m$ is a Stieltjes transform, so that $\alpha = 0$ carries little meaning. Therefore, the point $0$ will always be excluded from the domain of definition of $\psi$.

Analytical properties of $F_{y,H}$ can be derived from the fundamental equation (3.2). The following lemma, due to Silverstein and Choi, characterizes the close relationship between the supports of the generating measure $H$ and the generated M.P. distribution $F_{y,H}$.

Lemma 3.1. If $\lambda \notin \Gamma_{F_{y,H}}$, then $m(\lambda) \ne 0$ and $\alpha = -1/m(\lambda)$ satisfies

(i) $\alpha \notin \Gamma_H$ and $\alpha \ne 0$ (so that $\psi(\alpha)$ is well defined);

(ii) $\psi'(\alpha) > 0$.

Conversely, if $\alpha$ satisfies (i)-(ii), then $\lambda = \psi(\alpha) \notin \Gamma_{F_{y,H}}$.

It is then possible to determine the support of $F_{y,H}$ by looking at intervals where $\psi' > 0$. As an example, Figure 1 displays the function $\psi$ for the M.P. distribution with indexes $y = 0.3$ and $H$ the uniform distribution on the set $\{1, 4, 10\}$. The function $\psi$ is strictly increasing on the following intervals: $(-\infty, 0)$, $(0, 0.63)$, $(1.40, 2.57)$ and $(13.19, \infty)$. According to Lemma 3.1, we get

$$\Gamma_{F_{y,H}}^c \cap (0, \infty) = (0, 0.32) \cup (1.37, 1.67) \cup (18.00, \infty).$$

Hence, taking into account that $0$ belongs to the support of $F_{y,H}$, we have

$$\Gamma_{F_{y,H}} = \{0\} \cup [0.32, 1.37] \cup [1.67, 18.00].$$
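The interval endpoints quoted above can be reproduced numerically. The sketch below, in plain Python, is a simple bisection scheme of our own choosing rather than the authors' procedure. It locates the zeros of $\psi'$ for $y = 0.3$ and $H$ uniform on $\{1, 4, 10\}$, using the fact (recalled in Section 4) that $\psi'$ is concave between consecutive atoms of $H$, so that its maximum on each such interval is the zero of $\psi''$. It then evaluates $\psi$ at these zeros to recover the edges of $\Gamma_{F_{y,H}}$. The names `bisect`, `peak14`, `peak410` are hypothetical helpers.

```python
Y = 0.3
ATOMS = (1.0, 4.0, 10.0)        # H = uniform distribution on these points
W = 1.0 / len(ATOMS)

def psi(a):
    """psi(a) = a + y*a*int t/(a - t) dH(t), eq. (3.2)."""
    return a + Y * a * sum(W * t / (a - t) for t in ATOMS)

def dpsi(a):
    """psi'(a) = 1 - y*int t^2/(a - t)^2 dH(t)."""
    return 1.0 - Y * sum(W * t * t / (a - t) ** 2 for t in ATOMS)

def d2psi(a):
    """psi''(a) = 2y*int t^2/(a - t)^3 dH(t); its zero is the maximiser of psi'."""
    return 2.0 * Y * sum(W * t * t / (a - t) ** 3 for t in ATOMS)

def bisect(f, lo, hi, iters=200):
    """Root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

EPS = 1e-9
c1 = bisect(dpsi, EPS, 1 - EPS)              # (0, 1): psi' falls from 1 - y to -inf
peak14 = bisect(d2psi, 1 + EPS, 4 - EPS)     # maximiser of the concave psi' on (1, 4)
c2 = bisect(dpsi, 1 + EPS, peak14)
c3 = bisect(dpsi, peak14, 4 - EPS)
peak410 = bisect(d2psi, 4 + EPS, 10 - EPS)   # maximiser of psi' on (4, 10)
c4 = bisect(dpsi, 10 + EPS, 1e6)             # (10, inf): psi' rises from -inf to 1

critical = [c1, c2, c3, c4]
edges = [psi(c) for c in critical]
print([round(c, 2) for c in critical])       # zeros of psi'
print([round(e, 2) for e in edges])          # edges of the support of F_{y,H}
print(dpsi(peak410) < 0)                     # True: no increasing piece on (4, 10)
```

The computed zeros and edges agree with the values quoted above to within about 0.01. The check on $(4, 10)$ confirms that $\psi$ has no increasing piece there, so $[1.67, 18.00]$ is a single support interval.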



We refer to Bai and Silverstein [3] for a complete account of analytical properties of the family of M.P. distributions $\{F_{y,H}\}$ and the maps $\{\psi_{y,H}\}$. In particular, the following conclusions will be useful:

• when restricted to the set where $\psi_{y,H}' > 0$, $\psi_{y,H}$ has a well-defined inverse function $\psi_{y,H}^{-1} : \Gamma_{F_{y,H}}^c \to \Gamma_H^c$, which is strictly increasing;

• the family $\{F_{y,H}\}$ is continuous in its index parameters $(y, H)$ in a wide sense. For example, $\psi_{y,H}$ tends to the identity function as $y \to 0$.



3.2. Exact separation of sample eigenvalues. We first need to quote two results of Bai and Silverstein [1] on the exact separation of sample eigenvalues. Recall the ESDs $(H_n)$ of $(T_p)$, $y_n = p/n$, and let $\{F_{y_n,H_n}\}$ be the sequence of associated M.P. distributions. One should not confuse the M.P. distribution $F_{y_n,H_n}$ with the ESD of $\underline{S}_n$, although both converge to the M.P. distribution $F_{y,H}$ as $n \to \infty$.

Proposition 3.1. Assume that Conditions (a)-(d) hold, together with the following:

(f) The interval $[a, b]$ with $a > 0$ lies in an open interval $(c, d)$ outside the support of $F_{y_n,H_n}$ for all large $n$.

Then

$$P(\text{no eigenvalue of } S_n \text{ appears in } [a, b] \text{ for all large } n) = 1.$$

Roughly speaking, Proposition 3.1 states that a gap in the spectra of the $F_{y_n,H_n}$'s is also a gap in the spectrum of $S_n$ for large $n$. Moreover, under Condition (f), we know by Lemma 3.1 that for large $n$, $\psi_{y_n,H_n}^{-1}([a, b]) \subset \Gamma_{H_n}^c$. By continuity of $F_{y_n,H_n}$ in its indexes, it follows that for large $n$

$$\psi^{-1}([a, b]) \subset \Gamma_{H_n}^c.$$

In other words, it holds for large $n$ that $\psi^{-1}([a, b])$ contains no eigenvalue of $T_p$. For these $n$, let the integer $i_n \ge 0$ be such that

$$T_p \text{ has exactly } i_n \text{ eigenvalues larger than } \psi^{-1}(b). \tag{3.3}$$

Proposition 3.2. Assume that Conditions (a)-(d) and (f) hold. If $y[1 - H(0)] \le 1$, or $y[1 - H(0)] > 1$ but $[a, b]$ is not contained in $[0, x_0]$, where $x_0 > 0$ is the smallest value of the support of $F_{y,H}$, then with $i_n$ defined in (3.3) we have

$$P(\lambda_{i_n+1}^{S_n} < a < b < \lambda_{i_n}^{S_n} \text{ for all large } n) = 1.$$

In other words, under these conditions, it happens eventually that the numbers of sample eigenvalues $\{\lambda_j^{S_n}\}$ on both sides of $[a, b]$ match exactly the numbers of population eigenvalues $\{\alpha_k, \beta_{n,j}\}$ on both sides of the interval $\psi^{-1}([a, b])$.


4. Almost sure convergence of sample eigenvalues from generalized spikes

From (3.2), we have

$$\psi'(\alpha) = 1 - y \int \frac{t^2}{(\alpha - t)^2} \, dH(t).$$

Therefore, when $\alpha$ approaches the boundary of the support of $H$, $\psi'(\alpha)$ tends to $-\infty$; see also Figure 1. Moreover, $\psi'$ is concave on any interval outside $\Gamma_H$.
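For completeness, the concavity claim can be checked by differentiating (3.2) under the integral sign, which is legitimate for $\alpha$ at positive distance from $\Gamma_H$. This short verification is ours and is not part of the original argument:

$$\begin{aligned}
\frac{d}{d\alpha}\Big[\frac{\alpha t}{\alpha - t}\Big] &= -\frac{t^2}{(\alpha - t)^2}
\quad\Longrightarrow\quad
\psi'(\alpha) = 1 - y\int \frac{t^2}{(\alpha - t)^2}\,dH(t),\\
\psi''(\alpha) &= 2y\int \frac{t^2}{(\alpha - t)^3}\,dH(t),\qquad
\psi'''(\alpha) = -6y\int \frac{t^2}{(\alpha - t)^4}\,dH(t) < 0 .
\end{aligned}$$

Since $\psi''' < 0$ whenever $H \neq \delta_0$, $\psi''$ is decreasing and hence $\psi'$ is concave on each interval of $\Gamma_H^c$. In particular, on such an interval the set where $\psi' > 0$ is a single (possibly empty) interval.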

As we will see, the asymptotic behavior of the sample eigenvalues generated by a generalized spike eigenvalue $\alpha$ depends on the sign of $\psi'(\alpha)$.

Definition 4.1. We call a generalized spike eigenvalue $\alpha$ a distant spike for the M.P. law $F_{y,H}$ if $\psi'(\alpha) > 0$, and a close spike if $\psi'(\alpha) \le 0$.

Recall that $\psi$ depends on the parameters $(y, H)$. When $H$ is fixed, and since $\psi$ tends to the identity function as $y \to 0$, a close spike for a given M.P. law $F_{y,H}$ becomes a distant spike for the M.P. law $F_{y',H}$ when $y'$ is small enough.


As an example, different types of spikes are displayed in Figure 2. The solid curve corresponds to a zoomed view of $\psi_{0.3,H}$ of Figure 1. The three values $\alpha_1$, $\alpha_2$ and $\alpha_3$ are close spikes; any small enough $\alpha$ (close to zero), any large enough $\alpha$ (not displayed), or any value between $u$ and $v$ (see the figure) is a distant spike. Furthermore, as $y$ decreases from 0.3 to 0.02 (dashed curve), $\alpha_1$, $\alpha_2$ and $\alpha_3$ all become distant spikes.

Throughout this section, for each spike eigenvalue $\alpha_k$, we denote by $\nu_k + 1, \ldots, \nu_k + n_k$ the descending ranks of $\alpha_k$ among the eigenvalues of $T_p$ (multiplicities of eigenvalues are counted): in other words, there are $\nu_k$ eigenvalues of $T_p$ larger than $\alpha_k$ and $p - \nu_k - n_k$ less than $\alpha_k$.
Theorem 4.1. Assume that conditions (a)-(e) hold. Let $\alpha_k$ be a generalized spike eigenvalue of multiplicity $n_k$ satisfying $\psi'(\alpha_k) > 0$ (distant spike) with descending ranks $\nu_k + 1, \ldots, \nu_k + n_k$. Then the $n_k$ consecutive sample eigenvalues $\{\lambda_i^{S_n}\}$, $i = \nu_k + 1, \ldots, \nu_k + n_k$, converge almost surely to $\psi(\alpha_k)$.

Proof. Recalling Figure 2 of the $\psi$ function, for each distant spike $\alpha_k$ there is an interval $(u_k, v_k)$ containing $\alpha_k$ such that

• $\psi'(u_k) = \psi'(v_k) = 0$;

• $\psi'(\alpha) > 0$ for all $\alpha \in (u_k, v_k)$.

Here we make the convention that $v_k = \infty$ if $\psi'(\alpha) > 0$ for all $\alpha > \alpha_k$, and $u_k = 0$ if $\psi'(\alpha) > 0$ for all $\alpha \in (0, \alpha_k)$.

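Recall that the support of $F_{y_n,H_n}$ is determined by

$$\psi_n'(\alpha) = \psi_{y_n,H_n}'(\alpha) = 1 - y_n \Big[ \frac{p'}{p} \int \frac{t^2}{(\alpha - t)^2} \, d\tilde H_n(t) + \frac{1}{p} \sum_{j=1}^{K} \frac{n_j \alpha_j^2}{(\alpha - \alpha_j)^2} \Big], \tag{4.1}$$

where $\tilde H_n = \frac{1}{p'} \sum_{j=1}^{p'} \delta_{\beta_{n,j}}$ is the ESD of $V_p$.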

Let $\bar v_k = \min(v_k, \alpha_{k-1})$ if $k > 1$ and $\bar v_k = v_k$ otherwise. Choose $v$, $v'$ and $\alpha_u'$, $\alpha_u$ such that $\alpha_k < \alpha_u' < \alpha_u < v < v' < \bar v_k$. By condition (e), all eigenvalues of $T_p$ stay away from the interval $(\alpha_u', v')$ for all large $n$. Thus, $\psi_n'(\alpha) \to \psi'(\alpha) > 0$ uniformly on the interval $[\alpha_u', v']$. Hence, the interval $(\psi(\alpha_u'), \psi(v'))$ lies outside the support of $F_{y_n,H_n}$ for all large $n$. Consequently, the interval $[\psi(\alpha_u), \psi(v)]$ satisfies the conditions of Proposition 3.2 with $i_n = \nu_k$. Therefore, by Proposition 3.2 we have

$$P(\lambda_{\nu_k+1}^{S_n} < \psi(\alpha_u) < \psi(v) < \lambda_{\nu_k}^{S_n} \text{ for all large } n) = 1 \quad \text{if } \nu_k > 0;$$

$$P(\lambda_{\nu_k+1}^{S_n} < \psi(\alpha_u) \text{ for all large } n) = 1 \quad \text{otherwise}.$$

Therefore, it holds almost surely that

$$\limsup_n \lambda_{\nu_k+1}^{S_n} \le \psi(\alpha_u),$$

and finally, letting $\alpha_u \to \alpha_k$,

$$\limsup_n \lambda_{\nu_k+1}^{S_n} \le \psi(\alpha_k). \tag{4.2}$$

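Similarly, one can prove that for any $\bar u_k < u < \alpha_l < \alpha_k$,

$$P(\lambda_{\nu_k+n_k+1}^{S_n} < \psi(u) < \psi(\alpha_l) < \lambda_{\nu_k+n_k}^{S_n} \text{ for all large } n) = 1 \quad \text{if } \nu_k + n_k < p;$$

$$P(\lambda_{\nu_k+n_k}^{S_n} > \psi(\alpha_l) \text{ for all large } n) = 1 \quad \text{otherwise},$$

where $\bar u_k = \max(u_k, \alpha_{k+1})$ if $k < K$ and $\bar u_k = u_k$ otherwise. Consequently,

$$\liminf_n \lambda_{\nu_k+n_k}^{S_n} \ge \psi(\alpha_k). \tag{4.3}$$

Thus, we have proved that almost surely

$$\lim_n \lambda_{\nu_k+j}^{S_n} = \psi(\alpha_k), \qquad j = 1, \ldots, n_k.$$

The proof of Theorem 4.1 is complete. □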
Next we consider close spikes.

Theorem 4.2. Assume that conditions (a)-(e) hold. Let $\alpha_k$ be a generalized spike eigenvalue of multiplicity $n_k$ satisfying $\psi'(\alpha_k) \le 0$ (close spike) with descending ranks $\nu_k + 1, \ldots, \nu_k + n_k$. Let $I$ be the maximal interval in $\Gamma_H^c$ containing $\alpha_k$.

(i) If $I$ has a sub-interval $(u_k, v_k)$ on which $\psi' > 0$ (we then take this interval to be maximal), then the $n_k$ sample eigenvalues $\{\lambda_j^{S_n}\}$, $j = \nu_k + 1, \ldots, \nu_k + n_k$, converge almost surely to the number $\psi(w)$, where $w$ is the endpoint of $(u_k, v_k)$ nearest to $\alpha_k$;

(ii) If $\psi'(\alpha) \le 0$ for all $\alpha \in I$, then the $n_k$ sample eigenvalues $\{\lambda_j^{S_n}\}$, $j = \nu_k + 1, \ldots, \nu_k + n_k$, converge almost surely to the $\gamma$-th quantile of $G$, the LSD of $S_n$, where $\gamma = H(0, \alpha_k)$.

Proof. The proof refers to the curves of Figure 2.

(i) Suppose $\alpha_k$ is a spike eigenvalue satisfying $\psi'(\alpha_k) \le 0$ and there is an interval $(u_k, v_k) \subset I$ on which $\psi' > 0$ ($\alpha_k$ is like $\alpha_1$ in the figure). According to Lemma 3.1, $\psi((u_k, v_k)) \subset \Gamma_{F_{y,H}}^c$, and $\psi(u_k)$ is a boundary point of the support of $G$, the LSD of $S_n$. Without loss of generality, we can assume $\alpha_k < u_k$, the argument for the other situation where $\alpha_k > v_k$ being similar.

Choose $u_k < \alpha_u < v < \bar v$ (with $\bar v = \min(v_k, \alpha_{k-1})$ or $\bar v = v_k$ according to whether $k > 1$ or not) such that $(\alpha_u, v) \subset I$. By the argument used in the proof of Theorem 4.1, one can prove that

$$P(\lambda_{\nu_k+1}^{S_n} < \psi(\alpha_u) < \psi(v) < \lambda_{\nu_k}^{S_n} \text{ for all large } n) = 1 \quad \text{if } \nu_k > 0;$$

$$P(\lambda_{\nu_k+1}^{S_n} < \psi(\alpha_u) \text{ for all large } n) = 1 \quad \text{otherwise}.$$

Letting $\alpha_u \downarrow u_k$, this proves that almost surely

$$\limsup_n \lambda_{\nu_k+1}^{S_n} \le \psi(u_k) < \liminf_n \lambda_{\nu_k}^{S_n}.$$

On the other hand, since $\psi(u_k)$ is a boundary point of the support of $G$, we know that for any $\varepsilon > 0$, almost surely, the number of $\lambda_i^{S_n}$'s falling into $[\psi(u_k) - \varepsilon, \psi(u_k)]$ tends to infinity. Therefore,

$$\liminf_n \lambda_{\nu_k+n_k+1}^{S_n} \ge \psi(u_k) - \varepsilon \quad \text{a.s.}$$

Since $\varepsilon$ is arbitrary, we have finally proved that almost surely

$$\lim_n \lambda_{\nu_k+j}^{S_n} = \psi(u_k), \qquad j = 1, \ldots, n_k.$$

Thus, the proof of Conclusion (i) of Theorem 4.2 is complete.

Similarly, if the spike eigenvalue $\alpha_k$ is like $\alpha_2$, we can show that the $n_k$ corresponding eigenvalues of $S_n$ go to $\psi(v_k)$.

(ii) If the spike eigenvalue is like $\alpha_3$, where the gap in the support of the LSD has disappeared, the corresponding sample eigenvalues $\lambda_{\nu_k+1}^{S_n}, \ldots, \lambda_{\nu_k+n_k}^{S_n}$ clearly tend to the $\gamma$-th quantile of the LSD of $S_n$, where

$$\gamma = 1 - \lim \frac{\nu_k}{p} = H(0, \alpha_k).$$

□
4.1. Case of Johnstone's spiked population model. In the case of Johnstone's model, $H$ reduces to the Dirac mass $\delta_1$ and the LSD $G$ equals the Marcenko-Pastur law with $\Gamma_G = [a_y, b_y]$. Each $\alpha > 0$, $\alpha \ne 1$, is then a spike eigenvalue. The associated function $\psi$ in (3.2) becomes

$$\psi(\alpha) = \alpha + \frac{y\alpha}{\alpha - 1}. \tag{4.4}$$

The function $\psi$ has the following properties (see Figure 3):

• its range equals $(-\infty, a_y] \cup [b_y, \infty)$;

• $\psi(1 - \sqrt{y}) = a_y$, $\psi(1 + \sqrt{y}) = b_y$;

• $\psi'(\alpha) > 0 \iff |\alpha - 1| > \sqrt{y}$.
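These three properties are easy to confirm numerically. The sketch below uses $y = 1/2$, the value of Figure 3, and a handful of spike values chosen by us for illustration only. It checks that $\psi(1 \mp \sqrt{y})$ equals the Marcenko-Pastur edges, and that the spikes with $\psi' > 0$ are exactly those with $|\alpha - 1| > \sqrt{y}$, their limits falling outside $[a_y, b_y]$.

```python
import math

def psi(a, y):
    """Johnstone case, eq. (4.4): psi(a) = a + y*a/(a - 1), for a > 0, a != 1."""
    return a + y * a / (a - 1.0)

def dpsi(a, y):
    """psi'(a) = 1 - y/(a - 1)^2."""
    return 1.0 - y / (a - 1.0) ** 2

y = 0.5                                    # the value used in Figure 3
r = math.sqrt(y)
a_y, b_y = (1 - r) ** 2, (1 + r) ** 2
print(round(1 - r, 3), round(1 + r, 3))    # 0.293 1.707
print(round(a_y, 3), round(b_y, 3))        # 0.086 2.914
print(psi(1 - r, y), psi(1 + r, y))        # equal to a_y and b_y up to rounding

# Illustrative spike values: distant spikes are those with psi' > 0,
# i.e. |a - 1| > sqrt(y); their limits psi(a) lie outside [a_y, b_y].
spikes = [0.1, 0.25, 0.5, 1.5, 1.8, 3.0]
distant = [a for a in spikes if dpsi(a, y) > 0]
print(distant, [round(psi(a, y), 3) for a in distant])
```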

Therefore, by Theorem 4.1, for any spike eigenvalue satisfying $\alpha_k > 1 + \sqrt{y}$ (large enough) or $\alpha_k < 1 - \sqrt{y}$ (small enough), there is a packet of $n_k$ consecutive eigenvalues $\{\lambda_{n,j}\}$ converging almost surely to $\psi(\alpha_k) \notin [a_y, b_y]$. In other words, assume there are exactly $K_1$ spikes greater than $1 + \sqrt{y}$ and $K_2$ spikes smaller than $1 - \sqrt{y}$. By Theorems 4.1 and 4.2 we conclude that

(i) the $N_1 := n_1 + \cdots + n_{K_1}$ largest eigenvalues $\{\lambda_j^{S_n}\}$, $j = 1, \ldots, N_1$, tend to their respective limits $\{\psi(\alpha_k)\}$, $k = 1, \ldots, K_1$;

(ii) the immediately following largest eigenvalue $\lambda_{N_1+1}^{S_n}$ tends to the right edge $b_y$;

(iii) the $N_2 := n_K + \cdots + n_{K-K_2+1}$ smallest sample eigenvalues $\{\lambda_{p-j}^{S_n}\}$, $j = 0, \ldots, N_2 - 1$, tend to their respective limits $\{\psi(\alpha_k)\}$, $k = K, \ldots, K - K_2 + 1$;

(iv) the immediately following smallest eigenvalue $\lambda_{p-N_2}^{S_n}$ tends to the left edge $a_y$.

Hence we have recovered the content of Theorem 1.1 of [6].

4.2. An example of generalized spike eigenvalues. Assume that $T_p$ is diagonal with three base eigenvalues $\{1, 4, 10\}$, each repeated nearly $p/3$ times, and that there are four spike eigenvalues $(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (15, 6, 2, 0.5)$ with respective multiplicities $(n_k) = (3, 2, 2, 2)$. The limiting population-to-sample ratio is taken to be $y = 0.3$. The limiting population spectrum $H$ is then the uniform distribution on $\{1, 4, 10\}$. The support of the limiting Marcenko-Pastur distribution $F_{0.3,H}$ contains two intervals $[0.32, 1.37]$ and $[1.67, 18]$; see Section 3.1. The $\psi$-function of (3.2) for the current case is displayed in Figure 1. For the simulation, we use $p' = 600$, so that $T_p$ has the following 609 eigenvalues:

$$15, 15, 15, \underbrace{10, \ldots, 10}_{200}, 6, 6, \underbrace{4, \ldots, 4}_{200}, 2, 2, \underbrace{1, \ldots, 1}_{200}, 0.5, 0.5.$$

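From the table

| spike $\alpha_k$ | 15 | 6 | 2 | 0.5 |
|---|---|---|---|---|
| multiplicity $n_k$ | 3 | 2 | 2 | 2 |
| sign of $\psi'(\alpha_k)$ | + | − | + | + |
| $\psi(\alpha_k)$ | 18.65 | 5.82 | 1.55 | 0.29 |
| descending ranks | 1, 2, 3 | 204, 205 | 406, 407 | 608, 609 |

we see that 6 is a close spike for $F_{0.3,H}$ while the three others are distant ones.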
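The entries of this table, and the value $\gamma = H(0, 6) = 2/3$ used below, can be recomputed directly from (3.2) and Definition 4.1. The sketch below does this in plain Python. The constant names (`BASE`, `SPIKES`, `COPIES`) are our own, and `COPIES = 200` encodes the choice $p' = 600$ made above.

```python
Y = 0.3
BASE = (1.0, 4.0, 10.0)                          # atoms of H, weight 1/3 each
SPIKES = {15.0: 3, 6.0: 2, 2.0: 2, 0.5: 2}       # alpha_k -> multiplicity n_k
COPIES = 200                                     # p' = 600 base eigenvalues

def psi(a):
    """psi(a) = a + y*a*int t/(a - t) dH(t), eq. (3.2)."""
    return a + Y * a * sum(t / (a - t) for t in BASE) / len(BASE)

def dpsi(a):
    """psi'(a) = 1 - y*int t^2/(a - t)^2 dH(t)."""
    return 1.0 - Y * sum(t * t / (a - t) ** 2 for t in BASE) / len(BASE)

# The 609 population eigenvalues of T_p in descending order.
pop = sorted([t for t in BASE for _ in range(COPIES)]
             + [a for a, m in SPIKES.items() for _ in range(m)], reverse=True)

rows = []
for a, m in SPIKES.items():
    nu = sum(1 for t in pop if t > a)             # nu_k: population eigenvalues above a
    ranks = list(range(nu + 1, nu + m + 1))       # descending ranks nu_k+1, ..., nu_k+n_k
    kind = "distant" if dpsi(a) > 0 else "close"  # Definition 4.1
    gamma = sum(1 for t in BASE if t < a) / len(BASE)   # H(0, a), relevant for close spikes
    rows.append((a, m, kind, round(psi(a), 2), ranks, gamma))

for row in rows:
    print(row)
```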
By Theorems 4.1 and 4.2 we know that

• the 7 sample eigenvalues $\lambda_j^{S_n}$ with $j \in \{1, 2, 3, 406, 407, 608, 609\}$ associated to distant spikes tend to 18.65, 1.55 and 0.29, respectively, which are located outside the support of the limiting distribution $F_{0.3,H}$ (or $G$);

• the two sample eigenvalues $\lambda_j^{S_n}$ with $j = 204, 205$ associated to the close spike 6 tend to a limit located inside the support, the $\gamma$-th quantile of the limiting distribution $G$, where $\gamma = H(0, 6) = 2/3$.

These facts are illustrated by a simulation sample displayed in Figure 4.
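A simulation in the spirit of Figure 4 can be sketched as follows. The assumptions are ours: real standard Gaussian entries (the caption of Figure 4 only says "Gaussian entries"), $n = p'/y = 2000$, and an arbitrary random seed. A single finite-$n$ draw only approximates the limits, so the printed values should be read as rough agreement rather than exact reproduction.

```python
import numpy as np

rng = np.random.default_rng(2009)                # arbitrary seed
y = 0.3
base = np.repeat([10.0, 4.0, 1.0], 200)          # p' = 600 base eigenvalues
spikes = np.array([15.0] * 3 + [6.0] * 2 + [2.0] * 2 + [0.5] * 2)
pop = np.sort(np.concatenate([spikes, base]))[::-1]   # the 609 eigenvalues of T_p
p = pop.size
n = int(round(base.size / y))                    # y_n = p'/n = 0.3, i.e. n = 2000

# T_p is diagonal, so x_j = T_p^{1/2} u_j simply rescales the rows of Z_n.
Z = rng.standard_normal((p, n))
X = np.sqrt(pop)[:, None] * Z
S = X @ X.T / n                                  # S_n
lam = np.linalg.eigvalsh(S)[::-1]                # sample eigenvalues, descending

def by_rank(js):
    """Sample eigenvalues with the given 1-based descending ranks."""
    return lam[np.asarray(js) - 1]

for js, limit in [((1, 2, 3), 18.65), ((406, 407), 1.55), ((608, 609), 0.29)]:
    print(js, np.round(by_rank(js), 3), "limit:", limit)
print((204, 205), np.round(by_rank((204, 205)), 3), "(close spike: inside the support)")
```

The distant-spike eigenvalues should sit near their limits $\psi(\alpha_k)$, in the gaps of the support, while $\lambda_{204}^{S_n}$ and $\lambda_{205}^{S_n}$ stay inside the upper bulk $[1.67, 18.00]$.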

5. CLT for sample eigenvalues from distant generalized spikes

Following Theorem 4.1, to any distant generalized spike eigenvalue $\alpha_k$ there corresponds a packet of $n_k$ consecutive sample eigenvalues $\{\lambda_j^{S_n} : j \in J_k\}$ converging to $\psi(\alpha_k) \notin \Gamma_G$, where $J_k$ are the descending ranks of $\alpha_k$ among the eigenvalues of $T_p$ (counting multiplicities). The aim of this section is to derive a CLT for the $n_k$-dimensional vector

$$\sqrt{n}\,\{\lambda_j^{S_n} - \psi(\alpha_k)\}, \qquad j \in J_k.$$
The method follows Bai and Yao [5], which considers Johnstone's spiked population model. Consider the random form $K_n$ introduced in (2.2) and let

$$A_n = (a_{ij}) = A_n(\lambda) = X_2^* (\lambda I - X_2 X_2^*)^{-1} X_2, \qquad \lambda \notin \Gamma_G. \tag{5.1}$$

By Lemma 6.2, detailed in Section 6, we know that $n^{-1} \operatorname{tr} A_n$, $n^{-1} \operatorname{tr} A_n A_n^*$ and $n^{-1} \sum_{i=1}^n a_{ii}^2$ converge, almost surely or in probability, to $y m_1(\lambda)$, $y m_2(\lambda)$ and $\big(y[1 + m_1(\lambda)] / \{\lambda - y[1 + m_1(\lambda)]\}\big)^2$, respectively. Here, the $m_j(\lambda)$ are some specific transforms of the LSD $G$ (see Section 6).

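Therefore, the random form $K_n$ in (2.2) can be decomposed as follows:

$$\begin{aligned}
K_n(\lambda) &= S_{11} + X_1 A_n X_1^* = \frac{1}{n} \xi_{1:n} (I + A_n) \xi_{1:n}^* \\
&= \frac{1}{n} \big\{ \xi_{1:n} (I + A_n) \xi_{1:n}^* - \Sigma \operatorname{tr}(I + A_n) \big\} + \frac{1}{n} \Sigma \operatorname{tr}(I + A_n) \\
&= \frac{1}{\sqrt{n}} R_n + [1 + y m_1(\lambda)] \Sigma + o_P\Big(\frac{1}{\sqrt{n}}\Big),
\end{aligned}$$

with

$$R_n = R_n(\lambda) = \frac{1}{\sqrt{n}} \big\{ \xi_{1:n} (I + A_n) \xi_{1:n}^* - \Sigma \operatorname{tr}(I + A_n) \big\}. \tag{5.2}$$

In the last derivation, we have used the fact that

$$\frac{1}{n} \operatorname{tr}(I + A_n) = 1 + y m_1(\lambda) + o_P\Big(\frac{1}{\sqrt{n}}\Big),$$

which follows from a CLT for $\operatorname{tr}(A_n)$.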
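The first equality in this decomposition, and the spectral form of $n^{-1} \operatorname{tr} A_n$ behind (6.1), are exact algebraic identities. The sketch below checks both on a toy example. Sizes, $\Sigma = \operatorname{diag}(15, 6)$, the base spectrum drawn from $\{1, 4, 10\}$ and real Gaussian entries are all illustrative assumptions. It compares $S_{11} + S_{12}(\lambda I - S_{22})^{-1}S_{21}$ with $\frac{1}{n}\xi_{1:n}(I + A_n)\xi_{1:n}^*$, and $\frac{1}{n}\operatorname{tr}A_n$ with $\frac{p'}{n}$ times the average of $x/(\lambda - x)$ over the eigenvalues $x$ of $S_{22}$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, p_prime, n = 2, 150, 500                  # toy sizes (illustrative assumption)
sigma_diag = np.array([15.0, 6.0])           # Sigma = diag(15, 6), illustrative
v_diag = rng.choice([1.0, 4.0, 10.0], size=p_prime)

xi = np.sqrt(sigma_diag)[:, None] * rng.standard_normal((M, n))     # xi_{1:n}
eta = np.sqrt(v_diag)[:, None] * rng.standard_normal((p_prime, n))  # eta_{1:n}
X1, X2 = xi / np.sqrt(n), eta / np.sqrt(n)
S22 = X2 @ X2.T

lam = 40.0                                   # to the right of spec(S22)
R = np.linalg.solve(lam * np.eye(p_prime) - S22, X2)   # (lam I - S22)^{-1} X2
A = X2.T @ R                                 # A_n(lam), eq. (5.1)

K_def = X1 @ X1.T + (X1 @ X2.T) @ R @ X1.T  # S11 + S12 (lam I - S22)^{-1} S21, eq. (2.2)
K_form = xi @ (np.eye(n) + A) @ xi.T / n     # (1/n) xi_{1:n} (I + A_n) xi_{1:n}^*

x = np.linalg.eigvalsh(S22)
tr_direct = np.trace(A) / n
tr_spectral = (p_prime / n) * np.mean(x / (lam - x))   # y_n * average of x/(lam - x)
print(np.max(np.abs(K_def - K_form)), tr_direct, tr_spectral)
```

Both comparisons agree up to floating-point rounding, since $\operatorname{tr} A_n = \operatorname{tr}[(\lambda I - S_{22})^{-1} S_{22}]$.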

For the statement of our result, we first need to find the limit distribution of the sequence of random matrices $\{R_n(\lambda)\}$. The situation is different for the real and complex cases. By applying Propositions 3.1 and 3.2 in [5], we have for $\lambda \notin \Gamma_G$:

(i) if the variables $(w_{ij})$ are real-valued, the random matrix $R_n(\lambda)$ converges weakly to a symmetric random matrix $R(\lambda) = (R_{ij}(\lambda))$ with zero-mean Gaussian entries having an explicitly known covariance function;

(ii) if the variables $(w_{ij})$ are complex-valued, the random matrix $R_n(\lambda)$ converges weakly to a zero-mean Hermitian random matrix $R(\lambda) = (R_{ij}(\lambda))$. Moreover, the real and imaginary parts of its upper-triangular block $\{R_{ij}(\lambda), 1 \le i \le j \le M\}$ form an $M^2$-dimensional Gaussian vector with an explicitly known covariance matrix.

We are now in a position to state our CLT. Let the spectral decomposition of $\Sigma$ be

$$\Sigma = U \begin{pmatrix} \alpha_1 I_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha_K I_{n_K} \end{pmatrix} U^*, \tag{5.3}$$

where $U$ is a unitary matrix. Let $\psi_k = \psi(\alpha_k)$ and let $R(\psi_k)$ be the weak Gaussian limit of the sequence of matrices of random forms $[R_n(\psi_k)]_n$ recalled above (in both the real and complex cases). Let

$$\tilde R(\psi_k) = U^* R(\psi_k) U. \tag{5.4}$$

Theorem 5.3. For each distant generalized spike eigenvalue $\alpha_k$, the $n_k$-dimensional real vector

$$\sqrt{n}\,\{\lambda_j^{S_n} - \psi_k, \; j \in J_k\}$$

converges weakly to the distribution of the $n_k$ eigenvalues of the Gaussian random matrix

$$\frac{1}{1 + y m_3(\psi_k) \alpha_k} \tilde R_{kk}(\psi_k),$$

where $\tilde R_{kk}(\psi_k)$ is the $k$-th diagonal block of $\tilde R(\psi_k)$ corresponding to the indexes $\{u, v \in J_k\}$.

It is worth noticing that the limiting distribution of such a packet of $n_k$ extreme sample eigenvalues is generally non-Gaussian, and these eigenvalues are asymptotically dependent. Indeed, the limiting distribution of a single extreme sample eigenvalue $\lambda_j^{S_n}$ is Gaussian if and only if the corresponding generalized spike eigenvalue is simple. We refer the reader to [5] for detailed examples illustrating these same facts for Johnstone's model.



6. Lemmas 

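For $\lambda \notin \Gamma_G$, we define

$$m_1(\lambda) = \int \frac{x}{\lambda - x} \, dG(x), \qquad m_2(\lambda) = \int \frac{x^2}{(\lambda - x)^2} \, dG(x), \qquad m_3(\lambda) = \int \frac{x}{(\lambda - x)^2} \, dG(x).$$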
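The three transforms are tied together by two elementary identities: differentiating under the integral gives $m_1'(\lambda) = -m_3(\lambda)$, and writing $\lambda x = x(\lambda - x) + x^2$ gives $\lambda m_3(\lambda) = m_1(\lambda) + m_2(\lambda)$. The sketch below checks both numerically, with $G$ replaced by the ESD of one simulated null-case sample covariance matrix. The sizes, the seed, the point $\lambda = 5$ and the helper name `m_transforms` are illustrative assumptions.

```python
import numpy as np

def m_transforms(eigs, lam):
    """Plug-in m_1, m_2, m_3 of Section 6, with G replaced by the ESD of `eigs`."""
    x = np.asarray(eigs, dtype=float)
    m1 = np.mean(x / (lam - x))
    m2 = np.mean(x ** 2 / (lam - x) ** 2)
    m3 = np.mean(x / (lam - x) ** 2)
    return m1, m2, m3

rng = np.random.default_rng(3)
p, n = 300, 1000                          # null case T_p = I_p with y = 0.3
Z = rng.standard_normal((p, n))
eigs = np.linalg.eigvalsh(Z @ Z.T / n)
lam = 5.0                                 # right of b_y = (1 + sqrt(0.3))^2, about 2.40

m1, m2, m3 = m_transforms(eigs, lam)
h = 1e-5
dm1 = (m_transforms(eigs, lam + h)[0] - m_transforms(eigs, lam - h)[0]) / (2 * h)
print(dm1 + m3)                           # central difference of m_1 plus m_3, close to 0
print(lam * m3 - (m1 + m2))               # exact identity, zero up to rounding
```

The relation $m_1' = -m_3$ is one way to see where the derivative-type quantity $m_3$ enters, namely through the $\lambda$-dependence of $K_n(\lambda)$ near $\psi_k$.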

The following lemma gives the law of large numbers for some useful statistics of $A_n$ defined in (5.1). We omit its proof because it is a straightforward extension of Lemma 6.1 of [5], related to Johnstone's spiked population model, to the present generalized spiked population model.

Lemma 6.2. Under the assumptions of Theorem 4.1, for all $\lambda \in [a, b]$, we have

$$\frac{1}{n} \operatorname{tr} A_n \xrightarrow{\text{a.s.}} y m_1(\lambda), \tag{6.1}$$

$$\frac{1}{n} \operatorname{tr} A_n A_n^* \xrightarrow{\text{a.s.}} y m_2(\lambda), \tag{6.2}$$

$$\frac{1}{n} \sum_{i=1}^n a_{ii}^2 \xrightarrow{\text{a.s.}} \Big( \frac{y[1 + m_1(\lambda)]}{\lambda - y[1 + m_1(\lambda)]} \Big)^2. \tag{6.3}$$

Lemma 6.3. For all $\lambda \in [a, b]$, $K_n(\lambda)$ converges almost surely to the constant matrix $[1 + y m_1(\lambda)] \Sigma$.

Proof. The random form in (2.2) can be decomposed as follows:

$$K_n(\lambda) = S_{11} + X_1 A_n X_1^* = \frac{1}{n} (\xi_1, \ldots, \xi_n) (I + A_n) (\xi_1, \ldots, \xi_n)^*.$$

Let $\mathcal{M}$ be the event that $S_{22}$ has no eigenvalues in the interval $[a', b']$, where $[a, b] \subset (a', b')$ and $[a', b'] \subset (c, d)$. On the event $\mathcal{M}$, the norm of $A_n$ is bounded. By independence, it is easy to show that

$$\frac{1}{n} \big\{ (\xi_1, \ldots, \xi_n)(I + A_n)(\xi_1, \ldots, \xi_n)^* \mathbb{1}_{\mathcal{M}} - \Sigma \operatorname{tr}(I + A_n) \mathbb{1}_{\mathcal{M}} \big\} \xrightarrow{\text{a.s.}} 0.$$

By Proposition 3.1, $\mathbb{1}_{\mathcal{M}} \to 1$ a.s. Thus

$$K_n(\lambda) = o_{\text{a.s.}}(1) + \Big[ \frac{1}{n} \operatorname{tr}(I + A_n) \Big] \Sigma \, \mathbb{1}_{\mathcal{M}} \xrightarrow{\text{a.s.}} (1 + y m_1(\lambda)) \Sigma, \tag{6.4}$$

where the last step follows from (6.1). The conclusion follows.

□
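Lemma 6.3 can be illustrated numerically: the relative distance between $K_n(\lambda)$ and the intermediate quantity $[\frac{1}{n}\operatorname{tr}(I + A_n)]\Sigma$ of (6.4) should shrink as $n$ grows. The sketch below assumes real Gaussian entries, $y = 0.3$, illustrative spikes $\Sigma = \operatorname{diag}(15, 6)$, a base spectrum drawn from $\{1, 4, 10\}$, and $\lambda = 40$, which lies to the right of the whole spectrum for these choices. The function name `lemma63_gap` is hypothetical.

```python
import numpy as np

def lemma63_gap(n, y=0.3, lam=40.0, seed=0):
    """Relative distance between K_n(lam) and (1/n) tr(I + A_n) Sigma."""
    rng = np.random.default_rng(seed)
    sigma_diag = np.array([15.0, 6.0])                    # illustrative spikes
    p_prime = int(y * n)
    v_diag = rng.choice([1.0, 4.0, 10.0], size=p_prime)   # illustrative base spectrum
    X1 = np.sqrt(sigma_diag)[:, None] * rng.standard_normal((2, n)) / np.sqrt(n)
    X2 = np.sqrt(v_diag)[:, None] * rng.standard_normal((p_prime, n)) / np.sqrt(n)
    S22, S12 = X2 @ X2.T, X1 @ X2.T
    # K_n(lam) = S11 + S12 (lam I - S22)^{-1} S21, eq. (2.2)
    K = X1 @ X1.T + S12 @ np.linalg.solve(lam * np.eye(p_prime) - S22, S12.T)
    x = np.linalg.eigvalsh(S22)
    scale = 1.0 + np.sum(x / (lam - x)) / n              # (1/n) tr(I + A_n)
    target = scale * np.diag(sigma_diag)
    return np.linalg.norm(K - target) / np.linalg.norm(target)

gaps = [lemma63_gap(n) for n in (200, 800, 3200)]
print(gaps)   # expected to shrink, roughly like n^{-1/2}
```

The observed decay is consistent with the $\sqrt{n}$ scaling of the fluctuation term $R_n$ in (5.2).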
Figure 1. The $\psi$ function for the Marcenko-Pastur distribution $F_{0.3,H}$ with $H$ the uniform distribution on the set $\{1, 4, 10\}$ (plot title: "The Psi function"). Blue points indicate intervals where $\psi' > 0$. Singular points of $\psi$ are indicated as vertical lines corresponding to the support of $H$. On the left, the support set of $F_{0.3,H}$ (except the point 0) and its complementary set are indicated as magenta and blue segments, respectively.
Figure 2. A zoomed view of the $\psi$ functions for the Marcenko-Pastur distributions $F_{0.3,H}$ (solid curve) and $F_{0.02,H}$ (dashed curve) with $H$ the uniform distribution on the set $\{1, 4, 10\}$ (plot title: "A zoomed view of Psi functions"). The three points $\alpha_1$, $\alpha_2$ and $\alpha_3$ are close spikes for $F_{0.3,H}$, where $\psi_{0.3,H}' < 0$. They all become distant spikes for $F_{0.02,H}$, as $\psi_{0.02,H}' > 0$ there.
Figure 3. The function $\alpha \mapsto \psi(\alpha) = \alpha + y\alpha/(\alpha - 1)$, which maps a spike eigenvalue $\alpha$ to the limit of an associated sample eigenvalue in Johnstone's spiked population model. Figure with $y = 1/2$: $[1 \mp \sqrt{y}] = [0.293, 1.707]$; $[(1 \mp \sqrt{y})^2] = [0.086, 2.914]$.
Figure 4. An example of $p = 609$ sample eigenvalues (a), and two zoomed views (b) and (c) on $[5, 7]$ and $[0, 2]$, respectively. The limiting distribution of the ESD has support $[0.32, 1.37] \cup [1.67, 18.00]$. The 9 sample eigenvalues $\{\lambda_j^{S_n}, j = 1, 2, 3, 204, 205, 406, 407, 608, 609\}$ associated to the spikes are marked with a blue point. Gaussian entries.



